Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implements weighted shuffle using binary tree #185

Merged
merged 1 commit into from
Mar 23, 2024

Conversation

behzadnouri
Copy link

@behzadnouri behzadnouri commented Mar 11, 2024

This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

Problem

Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search resulting an overall
O(n log n log n) performance for weighted shuffle.

Summary of Changes

This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the tree
all can be done O(log n) resulting an overall O(n log n) weighted
shuffle implementation.

Based on benchmarks, this results in 24% improvement in
WeightedShuffle::shuffle:

Fenwick tree:

test bench_weighted_shuffle_new     ... bench:      36,686 ns/iter (+/- 191)
test bench_weighted_shuffle_shuffle ... bench:     342,625 ns/iter (+/- 4,067)

Binary tree:

test bench_weighted_shuffle_new     ... bench:      59,131 ns/iter (+/- 362)
test bench_weighted_shuffle_shuffle ... bench:     260,194 ns/iter (+/- 11,195)

Though WeightedShuffle::new is now slower, it generally can be cached
and reused as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68

Additionally the new code has better asymptotic performance. For
example with 20_000 weights WeightedShuffle::shuffle is 31% faster:

Fenwick tree:

test bench_weighted_shuffle_new     ... bench:     255,071 ns/iter (+/- 9,591)
test bench_weighted_shuffle_shuffle ... bench:   2,466,058 ns/iter (+/- 9,873)

Binary tree:

test bench_weighted_shuffle_new     ... bench:     830,727 ns/iter (+/- 10,210)
test bench_weighted_shuffle_shuffle ... bench:   1,696,160 ns/iter (+/- 75,271)

@behzadnouri behzadnouri force-pushed the weighted-shuffle-tree branch from e5d304a to 9c3461f Compare March 11, 2024 18:15
@behzadnouri behzadnouri changed the title implements weighted shuffle using binary search tree implements weighted shuffle using binary tree Mar 11, 2024
@behzadnouri behzadnouri force-pushed the weighted-shuffle-tree branch from 9c3461f to 9558d6d Compare March 11, 2024 18:50
@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2024

Codecov Report

Attention: Patch coverage is 99.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.9%. Comparing base (b3fd87f) to head (18531c9).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #185   +/-   ##
=======================================
  Coverage    81.9%    81.9%           
=======================================
  Files         837      837           
  Lines      226792   226842   +50     
=======================================
+ Hits       185795   185851   +56     
+ Misses      40997    40991    -6     

@behzadnouri behzadnouri force-pushed the weighted-shuffle-tree branch 2 times, most recently from 8e8e949 to 52e26b9 Compare March 14, 2024 21:29
This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search resulting an overall
O(n log n log n) performance for weighted shuffle.

This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the tree
all can be done O(log n) resulting an overall O(n log n) weighted
shuffle implementation.

Based on benchmarks, this results in 24% improvement in
WeightedShuffle::shuffle:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:      36,686 ns/iter (+/- 191)
    test bench_weighted_shuffle_shuffle ... bench:     342,625 ns/iter (+/- 4,067)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:      59,131 ns/iter (+/- 362)
    test bench_weighted_shuffle_shuffle ... bench:     260,194 ns/iter (+/- 11,195)

Though WeightedShuffle::new is now slower, it generally can be cached
and reused as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68

Additionally the new code has better asymptotic performance. For
example with 20_000 weights WeightedShuffle::shuffle is 31% faster:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:     255,071 ns/iter (+/- 9,591)
    test bench_weighted_shuffle_shuffle ... bench:   2,466,058 ns/iter (+/- 9,873)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:     830,727 ns/iter (+/- 10,210)
    test bench_weighted_shuffle_shuffle ... bench:   1,696,160 ns/iter (+/- 75,271)
@behzadnouri behzadnouri force-pushed the weighted-shuffle-tree branch from 52e26b9 to 18531c9 Compare March 14, 2024 23:41
@behzadnouri behzadnouri requested a review from gregcusack March 14, 2024 23:42
@behzadnouri
Copy link
Author

This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

The rest is implemented in #259.

Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. is it worth periodically caching the weighted shuffle in push_active_set.rs since it takes longer to build? Although the shuffle op more than makes up for the slower build.

I also tested a few other combinations on my end that worked as expected.

self.arr[k] -= weight;
k += k & k.wrapping_neg();
fn remove(&mut self, k: usize, weight: T) {
self.weight -= weight;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this result in a negative self.weight? or an underflow if self.weight is unsigned

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, unless there is a bug in this code.

self.weight is meant to be sum of all weights.
This function remove is only meant to be called when an item is sampled and its corresponding weight is removed from the tree. Because before sampling the item's weight is included in self.weight, self.weight >= weight.

while index < self.tree.len() {
if val < self.tree[index] {
weight = self.tree[index];
index = (index << 1) + 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible over flow here? but i the tree would have to be insanely big to have any issues. so should be fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

@behzadnouri
Copy link
Author

is it worth periodically caching the weighted shuffle in push_active_set.rs since it takes longer to build?

The following commit #259 will speed up WeightedShuffle::new as well.

@behzadnouri behzadnouri merged commit b6d2237 into anza-xyz:master Mar 23, 2024
33 checks passed
@behzadnouri behzadnouri deleted the weighted-shuffle-tree branch March 23, 2024 13:53
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Mar 25, 2024
This is port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

anza-xyz#185
implemented weighted shuffle using binary tree. Though asymptotically a
binary tree has better performance, compared to a Fenwick tree, it is
less cache local resulting in smaller improvements and in particular
slower WeightedShuffle::new.

In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.

With 4000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:      36,244 ns/iter (+/- 243)
    test bench_weighted_shuffle_shuffle ... bench:     149,082 ns/iter (+/- 1,474)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:      58,514 ns/iter (+/- 229)
    test bench_weighted_shuffle_shuffle ... bench:     269,961 ns/iter (+/- 16,446)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:      39,413 ns/iter (+/- 179)
    test bench_weighted_shuffle_shuffle ... bench:     364,771 ns/iter (+/- 2,078)

The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:     200,659 ns/iter (+/- 4,395)
    test bench_weighted_shuffle_shuffle ... bench:     941,928 ns/iter (+/- 26,492)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:     881,114 ns/iter (+/- 12,343)
    test bench_weighted_shuffle_shuffle ... bench:   1,822,257 ns/iter (+/- 12,772)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:     276,936 ns/iter (+/- 14,692)
    test bench_weighted_shuffle_shuffle ... bench:   2,644,713 ns/iter (+/- 49,252)
behzadnouri added a commit that referenced this pull request Mar 26, 2024
This is port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

#185
implemented weighted shuffle using binary tree. Though asymptotically a
binary tree has better performance, compared to a Fenwick tree, it has
less cache locality resulting in smaller improvements and in particular
slower WeightedShuffle::new.

In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.

With 4000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:      36,244 ns/iter (+/- 243)
    test bench_weighted_shuffle_shuffle ... bench:     149,082 ns/iter (+/- 1,474)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:      58,514 ns/iter (+/- 229)
    test bench_weighted_shuffle_shuffle ... bench:     269,961 ns/iter (+/- 16,446)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:      39,413 ns/iter (+/- 179)
    test bench_weighted_shuffle_shuffle ... bench:     364,771 ns/iter (+/- 2,078)

The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:     200,659 ns/iter (+/- 4,395)
    test bench_weighted_shuffle_shuffle ... bench:     941,928 ns/iter (+/- 26,492)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:     881,114 ns/iter (+/- 12,343)
    test bench_weighted_shuffle_shuffle ... bench:   1,822,257 ns/iter (+/- 12,772)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:     276,936 ns/iter (+/- 14,692)
    test bench_weighted_shuffle_shuffle ... bench:   2,644,713 ns/iter (+/- 49,252)
Copy link

mergify bot commented Mar 26, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

mergify bot pushed a commit that referenced this pull request Mar 26, 2024
This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search resulting an overall
O(n log n log n) performance for weighted shuffle.

This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the tree
all can be done O(log n) resulting an overall O(n log n) weighted
shuffle implementation.

Based on benchmarks, this results in 24% improvement in
WeightedShuffle::shuffle:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:      36,686 ns/iter (+/- 191)
    test bench_weighted_shuffle_shuffle ... bench:     342,625 ns/iter (+/- 4,067)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:      59,131 ns/iter (+/- 362)
    test bench_weighted_shuffle_shuffle ... bench:     260,194 ns/iter (+/- 11,195)

Though WeightedShuffle::new is now slower, it generally can be cached
and reused as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68

Additionally the new code has better asymptotic performance. For
example with 20_000 weights WeightedShuffle::shuffle is 31% faster:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:     255,071 ns/iter (+/- 9,591)
    test bench_weighted_shuffle_shuffle ... bench:   2,466,058 ns/iter (+/- 9,873)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:     830,727 ns/iter (+/- 10,210)
    test bench_weighted_shuffle_shuffle ... bench:   1,696,160 ns/iter (+/- 75,271)

(cherry picked from commit b6d2237)
mergify bot added a commit that referenced this pull request Mar 26, 2024
… (#425)

implements weighted shuffle using binary tree (#185)

This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search resulting an overall
O(n log n log n) performance for weighted shuffle.

This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the tree
all can be done O(log n) resulting an overall O(n log n) weighted
shuffle implementation.

Based on benchmarks, this results in 24% improvement in
WeightedShuffle::shuffle:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:      36,686 ns/iter (+/- 191)
    test bench_weighted_shuffle_shuffle ... bench:     342,625 ns/iter (+/- 4,067)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:      59,131 ns/iter (+/- 362)
    test bench_weighted_shuffle_shuffle ... bench:     260,194 ns/iter (+/- 11,195)

Though WeightedShuffle::new is now slower, it generally can be cached
and reused as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68

Additionally the new code has better asymptotic performance. For
example with 20_000 weights WeightedShuffle::shuffle is 31% faster:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:     255,071 ns/iter (+/- 9,591)
    test bench_weighted_shuffle_shuffle ... bench:   2,466,058 ns/iter (+/- 9,873)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:     830,727 ns/iter (+/- 10,210)
    test bench_weighted_shuffle_shuffle ... bench:   1,696,160 ns/iter (+/- 75,271)

(cherry picked from commit b6d2237)

Co-authored-by: behzad nouri <[email protected]>
mergify bot pushed a commit that referenced this pull request Mar 26, 2024
This is port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

#185
implemented weighted shuffle using binary tree. Though asymptotically a
binary tree has better performance, compared to a Fenwick tree, it has
less cache locality resulting in smaller improvements and in particular
slower WeightedShuffle::new.

In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.

With 4000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:      36,244 ns/iter (+/- 243)
    test bench_weighted_shuffle_shuffle ... bench:     149,082 ns/iter (+/- 1,474)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:      58,514 ns/iter (+/- 229)
    test bench_weighted_shuffle_shuffle ... bench:     269,961 ns/iter (+/- 16,446)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:      39,413 ns/iter (+/- 179)
    test bench_weighted_shuffle_shuffle ... bench:     364,771 ns/iter (+/- 2,078)

The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:     200,659 ns/iter (+/- 4,395)
    test bench_weighted_shuffle_shuffle ... bench:     941,928 ns/iter (+/- 26,492)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:     881,114 ns/iter (+/- 12,343)
    test bench_weighted_shuffle_shuffle ... bench:   1,822,257 ns/iter (+/- 12,772)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:     276,936 ns/iter (+/- 14,692)
    test bench_weighted_shuffle_shuffle ... bench:   2,644,713 ns/iter (+/- 49,252)

(cherry picked from commit 30eecd6)
mergify bot added a commit that referenced this pull request Mar 27, 2024
#429)

implements weighted shuffle using N-ary tree (#259)

This is port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

#185
implemented weighted shuffle using binary tree. Though asymptotically a
binary tree has better performance, compared to a Fenwick tree, it has
less cache locality resulting in smaller improvements and in particular
slower WeightedShuffle::new.

In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.

With 4000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:      36,244 ns/iter (+/- 243)
    test bench_weighted_shuffle_shuffle ... bench:     149,082 ns/iter (+/- 1,474)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:      58,514 ns/iter (+/- 229)
    test bench_weighted_shuffle_shuffle ... bench:     269,961 ns/iter (+/- 16,446)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:      39,413 ns/iter (+/- 179)
    test bench_weighted_shuffle_shuffle ... bench:     364,771 ns/iter (+/- 2,078)

The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:     200,659 ns/iter (+/- 4,395)
    test bench_weighted_shuffle_shuffle ... bench:     941,928 ns/iter (+/- 26,492)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:     881,114 ns/iter (+/- 12,343)
    test bench_weighted_shuffle_shuffle ... bench:   1,822,257 ns/iter (+/- 12,772)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:     276,936 ns/iter (+/- 14,692)
    test bench_weighted_shuffle_shuffle ... bench:   2,644,713 ns/iter (+/- 49,252)

(cherry picked from commit 30eecd6)

Co-authored-by: behzad nouri <[email protected]>
willhickey pushed a commit to solana-labs/solana that referenced this pull request Mar 29, 2024
#429)

implements weighted shuffle using N-ary tree (#259)

This is port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

anza-xyz#185
implemented weighted shuffle using binary tree. Though asymptotically a
binary tree has better performance, compared to a Fenwick tree, it has
less cache locality resulting in smaller improvements and in particular
slower WeightedShuffle::new.

In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.

With 4000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:      36,244 ns/iter (+/- 243)
    test bench_weighted_shuffle_shuffle ... bench:     149,082 ns/iter (+/- 1,474)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:      58,514 ns/iter (+/- 229)
    test bench_weighted_shuffle_shuffle ... bench:     269,961 ns/iter (+/- 16,446)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:      39,413 ns/iter (+/- 179)
    test bench_weighted_shuffle_shuffle ... bench:     364,771 ns/iter (+/- 2,078)

The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:

N-ary tree (fanout 16):

    test bench_weighted_shuffle_new     ... bench:     200,659 ns/iter (+/- 4,395)
    test bench_weighted_shuffle_shuffle ... bench:     941,928 ns/iter (+/- 26,492)

Binary tree:

    test bench_weighted_shuffle_new     ... bench:     881,114 ns/iter (+/- 12,343)
    test bench_weighted_shuffle_shuffle ... bench:   1,822,257 ns/iter (+/- 12,772)

Fenwick tree:

    test bench_weighted_shuffle_new     ... bench:     276,936 ns/iter (+/- 14,692)
    test bench_weighted_shuffle_shuffle ... bench:   2,644,713 ns/iter (+/- 49,252)

(cherry picked from commit 30eecd6)

Co-authored-by: behzad nouri <[email protected]>
anwayde pushed a commit to firedancer-io/agave that referenced this pull request Jul 23, 2024
…a-xyz#185) (anza-xyz#425)

implements weighted shuffle using binary tree (anza-xyz#185)

This is partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c

Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search resulting an overall
O(n log n log n) performance for weighted shuffle.

This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the tree
all can be done O(log n) resulting an overall O(n log n) weighted
shuffle implementation.

Based on benchmarks, this results in 24% improvement in
WeightedShuffle::shuffle:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:      36,686 ns/iter (+/- 191)
    test bench_weighted_shuffle_shuffle ... bench:     342,625 ns/iter (+/- 4,067)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:      59,131 ns/iter (+/- 362)
    test bench_weighted_shuffle_shuffle ... bench:     260,194 ns/iter (+/- 11,195)

Though WeightedShuffle::new is now slower, it generally can be cached
and reused as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68

Additionally the new code has better asymptotic performance. For
example with 20_000 weights WeightedShuffle::shuffle is 31% faster:

Fenwick tree:
    test bench_weighted_shuffle_new     ... bench:     255,071 ns/iter (+/- 9,591)
    test bench_weighted_shuffle_shuffle ... bench:   2,466,058 ns/iter (+/- 9,873)

Binary tree:
    test bench_weighted_shuffle_new     ... bench:     830,727 ns/iter (+/- 10,210)
    test bench_weighted_shuffle_shuffle ... bench:   1,696,160 ns/iter (+/- 75,271)

(cherry picked from commit b6d2237)

Co-authored-by: behzad nouri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants